CS184 Homework 1 Write Up
Name: Jansen Samosa
GitHub repository: https://github.com/cal-cs184-student/sp25-hw1-jps
In this assignment, I implemented a rasterization pipeline for rendering SVG files, starting with basic rasterization of 2D triangles using three point-in-triangle edge tests. To improve image quality, I implemented supersampling for antialiasing by rendering at a higher resolution and then downsampling to smooth out jagged edges. I also worked on transformations, applying translation, scaling, and rotation matrices to manipulate objects in the scene. For texture mapping, I used barycentric coordinates to interpolate UV coordinates across triangles and implemented two pixel sampling methods - nearest and bilinear filtering - to improve texture appearance while maintaining performance. Additionally, I incorporated level sampling with mipmaps to reduce aliasing even further, comparing nearest-level and bilinear interpolation techniques. Finally, I analyzed the tradeoffs between these different antialiasing methods in terms of speed, memory usage, and visual fidelity, experimenting with different combinations to find the best balance between performance and image quality.
Approach and Implementation:
My approach to implementing rasterization of single-colored triangles is based on the following steps:
- Create a bounding box: take the max(...)/min(...) x and y values out of the three vertices and then floor those values, which gives us the dimensions for this bounding box.
- Check the winding order of the vertices: they are clockwise if (x1 - x0) * (y2 - y1) - (x2 - x1) * (y1 - y0) < 0.
- Store the vertices in the arrays verts_x[3] and verts_y[3], and reverse them if they were clockwise so that the winding is always counter-clockwise.
- Sample each pixel in the bounding box at its center (an offset of (0.5, 0.5)) and draw the pixel with the given color if it passes the three point-in-triangle tests.

Walk through how you rasterize triangles in your own words.
Rasterizing triangles in this assignment is done by sampling each pixel in the bounding box containing the triangle, to determine if that pixel should be drawn to the screen.
More specifically, sampling a pixel includes checking that the pixel passes three point-in-triangle tests. At a high level, the point-in-triangle test takes one of the edges of the triangle Pi -> Pj and checks that the pixel lies on the side of the line segment/edge that corresponds to the inside of the triangle. More specifically, we can model each edge as a line in 2D space, which is boundless and affine. This line splits the plane into two halves - a "negative half-space" and a "positive half-space" - with the positive half-space always corresponding to the inside of the edge, as long as we compute our line function with counter-clockwise winding vertices (so that the direction of the normal vector always points into the triangle). So, we perform this test three times, once for each edge, to determine if the pixel is on the positive/inside side of each edge (or on the edge itself), which is equivalent to the pixel being inside the triangle, meaning it should be drawn to the screen.
An additional detail to consider is what coordinate we consider (and therefore sample) each pixel as. To have an accurate rasterization of the triangle, it is important to sample each pixel at its center i.e. we sample pixel (x, y) at position (x + 0.5, y + 0.5)... Otherwise we may end up with a slightly shifted representation of the triangle.
A point-in-triangle test can be created in code as follows:
// the following runs a point-in-triangle test for pixel (x, y) given an edge P_i -> P_(i+1)
float dx, dy, L;
dx = verts_x[i + 1] - verts_x[i];
dy = verts_y[i + 1] - verts_y[i];
L = -((x + 0.5) - verts_x[i]) * dy + ((y + 0.5) - verts_y[i]) * dx;
if (L < 0) continue; // pixel fails this edge's test, so it is outside the triangle
Explain how your algorithm is no worse than one that checks each sample within the bounding box of the triangle. The bounding box of the triangle is defined as the smallest rectangle that can be drawn whilst ensuring that the entire triangle is within it.
...
int x_max = floor(std::max({x0, x1, x2}));
int x_min = floor(std::min({x0, x1, x2}));
int y_max = floor(std::max({y0, y1, y2}));
int y_min = floor(std::min({y0, y1, y2}));
...
for (int x = x_min; x <= x_max; x++) {
for (int y = y_min; y <= y_max; y++) {
// sample pixel (x + 0.5, y + 0.5)
}
}
...
My algorithm creates a bounding box by finding the maximum and minimum x and y values for the triangle. These values represent the dimensions and position of this bounding box and the code only iterates through the pixels in this bounding box.
Show a png screenshot of basic/test4.svg with the default viewing parameters and with the pixel inspector centered on an interesting part of the scene.
Approach and Implementation:
My approach to implementing supersampling in my rasterizer builds off the previous section. In my triangle rasterization function rasterize_triangle, I keep everything the same except that I scale the input vertices at the beginning of the function by sqrt(sample_rate). I also make sure that any instances of width and height are replaced with width * sqrt(sample_rate) and height * sqrt(sample_rate). With these changes, the process of sampling the triangle remains identical, just at a higher resolution than the framebuffer (specifically, scaled by sample_rate in total pixel count).
However, now that the sample buffer and the frame buffer are at different resolutions, the resolve_to_framebuffer function must be modified to properly downscale the sample buffer back down to the frame buffer's resolution.
My approach to modifying resolve_to_framebuffer was to think of each frame buffer pixel as a collection of sample_rate pixels in the sample buffer that must be averaged into a single pixel. E.g. if sample_rate is 9, there are 9 samples corresponding to each frame buffer pixel. So, I loop through each frame buffer pixel, and in each iteration I convert the frame buffer coordinate to its respective bottom-left sample buffer pixel (by scaling the coordinate by sqrt(sample_rate)). Then, I loop through all sample_rate samples, average the color values, and write the resulting color into the frame buffer.
Some additional things to note are that the rasterize_point function had to be modified to make sure it's drawing to all the pixels in the sample buffer that correspond to the single frame buffer pixel coordinate passed into the function. Also, set_sample_rate had to be modified so that the sample buffer is properly resized each time the supersample resolution is changed.
Walk through your supersampling algorithm and data structures. Why is supersampling useful? What modifications did you make to the rasterization pipeline in the process? Explain how you used supersampling to antialias your triangles.
The main change to the rasterization pipeline is that the sample buffer and the frame buffer are now different sized buffers, and the code must be modified to support this change.
Data structures:
- std::vector<Color> sample_buffer; is a width * height * sample_rate sized buffer that holds all the sampled data necessary to form our render. This buffer by itself cannot be passed to the GPU and must first be downscaled to the proper resolution before doing so. It can also be thought of as a high resolution version of what will be rendered once it gets downscaled.
- unsigned char* rgb_framebuffer_target; is a width * height buffer of rgb pixels that gets sent to the GPU and rendered onto the screen.
- A scale factor of sqrt(sample_rate) converts a frame buffer pixel coordinate to a sample buffer pixel coordinate, or converts a dimension such as width or height to the corresponding dimension for the sample buffer. Often in my code you can see variables named ss_x, ss_y, ss_width, etc., which refer to a coordinate or dimension in the sample buffer.

Algorithms:
- In rasterize_triangle, the algorithm for rasterization is effectively the same as in task 1, except that it now samples at a higher resolution, according to the set sample_rate/size of the sample buffer. This is done by first converting the coordinates/dimensions to their respective sample buffer coordinates/dimensions and then running the algorithm as normal.
- In resolve_to_framebuffer, as the code loops through each frame buffer coordinate, it converts the coordinate to its sample buffer coordinate (specifically its "bottom left" pixel) and then loops over and averages the pixels in the sqrt(sample_rate) by sqrt(sample_rate) sub-grid corresponding to that frame buffer pixel, to determine the correct frame buffer pixel color.

Supersampling as an anti-aliasing technique is useful and effective because it smooths out hard edges (removes high frequencies) from the resulting image. In other words, it allows the rasterizer to create "soft edges" where the resultant color for a pixel is a gradient between the triangle's color and the color of a pixel next to the triangle if, for example, a pixel is half inside and half outside the triangle. This gives the renderer more "flexibility" to represent the image despite a low resolution (reducing aliasing), and it also appears "smoother" to our eyes/brains.
Show png screenshots of basic/test4.svg with the default viewing parameters and sample rates 1, 4, and 16 to compare them side-by-side. Position the pixel inspector over an area that showcases the effect dramatically; for example, a very skinny triangle corner. Explain why these results are observed.
The above images show test4.svg at sample rates 1, 4, and 16 (from left to right). From the pixel inspector, we can see that at sample rate 1 there is very noticeable aliasing at the sharp point of the red triangle. And even in the macro view (not just in the pixel inspector), the aliasing greatly affects the look of the shape.
At sample rate 4, we can see that there is less aliasing due to the blurring effect of supersampling, improving the perceived accuracy and look of the triangle. Specifically, we can see some pixels that are fully red and some that are blended with the white background to create pink-ish/transparent-looking red pixels. These pixels are formed because, with supersampling, the triangle is rasterized at a higher resolution and then downscaled to the proper resolution. In the case of 4x supersampling, this means that groups of 4 samples are averaged into a single pixel in the framebuffer. So if, for example, a group of 4 supersample pixels consists of 3 white pixels and 1 red pixel, the corresponding framebuffer pixel is roughly 25% inside the triangle, and those sample buffer pixels are averaged into a pink-ish framebuffer pixel. This occurs most often at the edges of the triangle, or at sharp corners such as in the example above.
However, at sample rate 4 there is still some aliasing, which can be seen in the generally inconsistent opacity of the red color (it's opaque, then pinkish, then opaque again), which doesn't make sense given that the triangle should be getting "sharper" as it reaches the vertex. If we increase the sample rate to 16 so that we sample at 16x the resolution, we see an even greater improvement in anti-aliasing, since calculating 16 samples per framebuffer pixel gives the flexibility to display finer details.
Approach and Implementation:
Translation matrix:
return Matrix3x3(
1, 0, dx,
0, 1, dy,
0, 0, 1);
Scaling matrix:
return Matrix3x3(
sx, 0, 0,
0, sy, 0,
0, 0, 1);
Rotation matrix:
double cos_temp = cos(radians(deg));
double sin_temp = sin(radians(deg));
return Matrix3x3(
cos_temp, -sin_temp, 0,
sin_temp, cos_temp, 0,
0, 0, 1);
Create an updated version of svg/transforms/robot.svg with cubeman doing something more interesting, like waving or running. Feel free to change his colors or proportions to suit your creativity. Save your svg file as my_robot.svg in your docs/ directory and show a png screenshot of your rendered drawing in your write-up. Explain what you were trying to do with cubeman in words.
I changed cubeman so that he is waving. I also gave him a cape and some very basic lighting/shading.
Approach and Implementation:
To implement rasterize_interpolated_color_triangle, I copy-pasted my code from rasterize_triangle and added some additional code before drawing each pixel in the bounding box to calculate an interpolated color based on the color of each vertex and the pixel's barycentric coordinates.
The additional block of code is as follows:
float alpha = (-(ss_x - x1) * (y2 - y1) + (ss_y - y1) * (x2 - x1)) / (-(x0 - x1) * (y2 - y1) + (y0 - y1) * (x2 - x1));
float beta = (-(ss_x - x2) * (y0 - y2) + (ss_y - y2) * (x0 - x2)) / (-(x1 - x2) * (y0 - y2) + (y1 - y2) * (x0 - x2));
float gamma = 1 - alpha - beta;
Color color = alpha * c0 + beta * c1 + gamma * c2;
sample_buffer[ss_y * width * sample_rate_sqrt + ss_x] = color;
Explain barycentric coordinates in your own words and use an image to aid you in your explanation. One idea is to use a svg file that plots a single triangle with one red, one green, and one blue vertex, which should produce a smoothly blended color triangle.
Barycentric coordinates are a way of representing the position of a point within a triangle, relative to its vertices. More specifically, they define a coordinate in terms of how much its position is interpolated between the three vertices. Once these coordinates are calculated, we can use them to interpolate values other than positions, such as colors, as can be seen in the above image.
For example, consider the above triangle with one red, one green, and one blue vertex. Let the top-most vertex be the red one: a point exactly at that vertex has a barycentric coordinate of 1 for red and 0 for the other two, so it is rendered fully red, while a point at the triangle's centroid is an even blend of all three colors.
Additionally, the coordinates always sum to 1, so each one can be read as the "weight" with which its vertex contributes to the point.
Show a png screenshot of svg/basic/test7.svg with default viewing parameters and sample rate 1. If you make any additional images with color gradients, include them.
Approach and Implementation:
To implement rasterize_textured_triangle, I copied my code from rasterize_interpolated_color_triangle and modified it to input the color returned from tex.sample(...) into the sample buffer for each pixel, instead of the interpolated color value from each vertex. However, I am still using barycentric coordinates here; instead of interpolating colors, I am using them to determine the UV coordinate for each pixel, which is calculated as an interpolation between all three vertices' UV coordinates.
Explain pixel sampling in your own words and describe how you implemented it to perform texture mapping. Briefly discuss the two different pixel sampling methods, nearest and bilinear.
In order to map a texture onto a surface, such as a triangle, each vertex of the shape must have a provided UV coordinate that tells the rasterizer where on the texture that vertex maps to. (For example, a vertex having a UV coordinate of (0.5, 0.5) means that a sampled pixel positioned exactly on that vertex maps to the (0.5 * tex.width, 0.5 * tex.height)'th texel, assuming the viewport and the texture are the same resolution and the shape is flat on the screen.) To sample pixels that aren't exactly at a vertex, I use barycentric coordinates to interpolate the UV coordinates between the vertices to determine that pixel's UV coordinate, which is then passed to tex.sample(...) via a SampleParams object.
Texture sampling is the responsibility of the Texture object (i.e. the rasterizer just passes in the UV coordinate information and gets back a texel). The texture sampler must account for the fact that in the vast majority of cases, the texture will not appear at the same resolution as the viewport when rendered to the screen, so there will be aliasing depending on the method used to map a UV coordinate to a texel (the direct mapping from screen space to texture space isn't necessarily an integer-valued coordinate). Two methods are nearest filtering and bilinear filtering. With nearest filtering, the direct texture coordinate mapping is round()'d to the nearest texel coordinate; with bilinear filtering, a texel is generated as a linear interpolation (lerp) between the 4 nearest actual texels based on the exact position of the mapping.
Typically, nearest filtering is more prone to aliasing but is the most performant method, while bilinear filtering effectively anti-aliases the texture (especially when the texture is magnified, i.e. larger than the viewport resolution) while still being much more performant than supersampling, though not quite as performant as nearest filtering.
Check out the svg files in the svg/texmap/ directory. Use the pixel inspector to find a good example of where bilinear sampling clearly defeats nearest sampling. Show and compare four png screenshots using nearest sampling at 1 sample per pixel, nearest sampling at 16 samples per pixel, bilinear sampling at 1 sample per pixel, and bilinear sampling at 16 samples per pixel.
Above we can see the comparison between using nearest sampling with 1x supersampling (top-left), nearest sampling with 16x supersampling (top-right), bilinear sampling with 1x supersampling (bottom-left), and bilinear sampling with 16 supersampling (bottom-right).
In particular, nearest sampling with 1x supersampling appears to have the most aliasing, especially when focusing on the structural details of the campanile such as the clock hands, windows, and ridges. Additionally, all three other combinations appear virtually identical - though if we were to nitpick, the color of the clock hands in the bilinear 1x sampling example is very slightly more inconsistent and messy compared to bilinear 16x sampling, though the difference is basically non-existent unless you zoom in very far (like in the pixel inspector). This tells us that in this case, bilinear sampling (without supersampling) is the preferred method of anti-aliasing compared to supersampling, since it is significantly more performant and achieves the same results.
Comment on the relative differences. Discuss when there will be a large difference between the two methods and why.
It is important to note, however, that the above comparison is of a texture that is zoomed in/magnified, which explains the effectiveness of bilinear sampling. When zooming out, i.e. when the texture is minified, the effects are not very noticeable. The reason is that bilinear sampling only works with the 4 nearest texels.
In the magnification case, this is effective because there are more pixels in a single area than there are texels which results in us seeing each individual texel as a flat color when we aren't using bilinear sampling which makes the image appear jagged and "rough" - when we do use bilinear sampling, it allows the pixels to carry information from many surrounding texels so that there are smoother transitions in the image as a whole.
In the minification case, however, there are more texels in a single area than there are pixels. If we use bilinear sampling here, the effect won't be as pronounced: say there are 16 texels in the "footprint" of a single viewport pixel; if we only interpolate between 4 of those texels, it's unlikely to reduce aliasing by a significant amount, since it most likely doesn't produce any blurring/smoothing between neighboring viewport pixels.
Explain level sampling in your own words and describe how you implemented it for texture mapping.
Level sampling is an anti-aliasing method for when the texture is minified. In this case, many texels need to be represented by a single pixel, so the basic idea of level sampling is that before we sample the texture for a pixel, we should downscale the texture to a resolution at which the texel-to-pixel footprint is decreased, allowing us to get a better representation of the texture at that resolution. This reduces aliasing, and it is very performant because the lower resolution version(s) of the texture are usually pre-computed. In an actual implementation, there are multiple "levels" of downscaled textures, stored as a mipmap, where level 0 refers to the original (high resolution) texture and higher levels are more downscaled.
Once we have a mipmap, we need a method of deciding which level is most appropriate to use for each pixel. Luckily this is simple with some observations and math. The problem is that many texels need to be represented by a single pixel; the more texels that fall within a single pixel's "footprint", the lower resolution texture (higher level) we need. More specifically, we can estimate this footprint from the difference between UV coordinates of adjacent pixels in the triangle - the larger the difference, the higher the level we need:
params.p_uv = compute_uv(ss_x, ss_y);
params.p_dx_uv = compute_uv(ss_x + 1, ss_y);
params.p_dy_uv = compute_uv(ss_x, ss_y + 1);
Above, the code calculates the UV coordinate not just for the pixel we are sampling, but also for two adjacent pixels. These values are then used by the texture sampler to calculate the "footprint" of the sampled pixel, i.e. how many texels (of the original texture) lie "in between" the sampled pixel and its adjacent pixels. These footprints are stored in the local variables diff_u and diff_v and are used to calculate the appropriate level:
// in Texture::get_level function
Vector2D diff_u = (sp.p_dx_uv - sp.p_uv) * width;
Vector2D diff_v = (sp.p_dy_uv - sp.p_uv) * height;
float D = max(sqrt(pow(diff_u.x, 2) + pow(diff_v.x, 2)),
sqrt(pow(diff_u.y, 2) + pow(diff_v.y, 2))
);
D = clamp(std::log2(D), (float) 0, (float) mipmap.size() - 1);
D is the exact level that the sampler will use. However, it is important to consider that D is a float value and, for example, can be computed as a value like 2.4. Mipmaps are precomputed, and the concept of "level 2.4" doesn't exist because levels 2 and 3 are two separate, distinct textures.
To handle this, there are two techniques that are implemented in my rasterizer - nearest level, and bilinear level interpolation. Nearest level filtering is simple, we just round D to the nearest integer, and this works well and is good for performance as there is little to no extra overhead (apart from calculating D in the first place). Bilinear level interpolation however can be used for even more anti-aliasing - and it works by sampling from both levels that D is in between, and then lerp'ing between those two samples with a factor of the decimal part of D. Taking D = 2.4 for example - we sample from both mipmap level 2 and mipmap level 3, and then we return a 0.4 linear interpolation between those two samples.
I implemented bilinear level interpolation using the following code:
float t = D - floor(D);
int D1 = floor(D);
int D2 = min(D1 + 1, (int) mipmap.size() - 1); // clamp so D2 never indexes past the last level
Color sample_D1 = sample_once(D1);
Color sample_D2 = sample_once(D2);
return lerp(sample_D1, sample_D2, t);
Below, I included a visualization of nearest level filtering (left) and bilinear level interpolation filtering (right) on the campanile texture. The more black each pixel in the visualization is, the lower (higher res) the level being used. Pure black is level 0.
As can be seen in the images, with bilinear level interpolation the levels are "smoothed" out as a gradient.
It is also important to note that level filtering techniques are not mutually exclusive with the texture sampling techniques discussed in the previous section as level filtering is just a way of choosing which mipmap level/resolution is most appropriate to sample from, but not the method for sampling from the texture.
You can now adjust your sampling technique by selecting pixel sampling, level sampling, or the number of samples per pixel. Describe the tradeoffs between speed, memory usage, and antialiasing power between the three various techniques.
In terms of pure anti-aliasing power, if we compare these techniques individually, supersampling is clearly the winner, as it generates the clearest image with no compromises to visual fidelity. However, this comes at the cost of performance, since it multiplies the number of samples taken each frame by factors of 4, 9, 16, etc. And these are expensive samples (more so than with the texture filtering methods), since we waste many cycles sampling pixels outside the triangle but inside its bounding box. Additionally, there is a significant memory usage increase, since we must scale the size of the sample_buffer to store the higher resolution samples.
Level sampling - both nearest and bilinear - gives us a much more performant method of anti-aliasing compared to supersampling and provides very close results in most cases. Nearest level sampling in particular is very performant because it doesn't perform any extra samples; the only overhead is calculating the correct level D. Bilinear level sampling doubles the number of samples when sampling a texture but helps deal with tricky "spots" that nearest level sampling still has trouble with.
However, there is a downside to level sampling, which is that it appears to sacrifice visual fidelity. When quickly toggling level sampling off and on in the viewport, it's very noticeable that very tiny details on objects can disappear when it is on, and the whole image looks very slightly over-blurred. Another note is that using level sampling requires more memory than not using it, due to having to store multiple versions of the texture. This is a non-issue, though: since each level is a downscaled version of the previous one, the extra levels form a geometric series (1/4 + 1/16 + ...) and add only about a third more memory. When it comes to textures, level sampling seems much more "worth it" than supersampling if we had to choose between them in a real-time rendering setting, but for a pre-rendered image, supersampling makes more sense.
Bilinear pixel sampling has a unique case where it beats all the other anti-aliasing techniques implemented in this assignment, which is the case of a magnified image. In my testing, neither supersampling nor level sampling had any effect when zoomed in on the campanile. However, bilinear pixel sampling wasn't very effective when the image was zoomed out, though the effect wasn't negligible. Additionally, bilinear pixel sampling does require us to sample 4 texels for every pixel, though this is not necessarily that detrimental to performance, because each of those 4 samples is just one extra memory access into the texture buffer. Aside from the magnification case, supersampling is still better at anti-aliasing in comparison. And even with level sampling turned on, I could still see some extra anti-aliasing improvement when turning on bilinear pixel sampling. Additionally, there is virtually no extra memory usage.
Finally, we can combine all three anti-aliasing methods. A note about performance in this case is that the number of texture samples compounds when using these together. For example, with 4x supersampling, bilinear level sampling, and bilinear pixel sampling, there are 4 x 2 x 4 = 32 texture samples per frame buffer pixel (4 supersamples, each blending 2 mipmap levels, each of which bilinearly blends 4 texels).
From my testing, I found that some of the best results came from combining bilinear level sampling and bilinear pixel sampling, i.e. trilinear texture filtering. This eliminated most aliasing while maintaining good enough performance. However, trilinear filtering still came with the same "over-blurred" effect that bilinear level sampling had. Compared to supersampling, it was still much more performant.
Using a png file you find yourself, show us four versions of the image, using the combinations of L_ZERO and P_NEAREST, L_ZERO and P_LINEAR, L_NEAREST and P_NEAREST, as well as L_NEAREST and P_LINEAR.
Below you can see L_ZERO and P_NEAREST (top left), L_ZERO and P_LINEAR (top right), L_NEAREST and P_NEAREST (bottom left), and L_NEAREST and P_LINEAR (bottom right) of a skyline view. In particular, I noticed that level filtering seems to especially help anti-alias the buildings in the distance, and the effect of bilinear pixel sampling can be seen very clearly on the windows of the building near the camera.